Failed test 'default log is not used' with new procps 3.3.16-1

Bug #1864423 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
postgresql-common (Ubuntu)
Fix Released
Undecided
Unassigned
procps (Ubuntu)
Invalid
Undecided
Steve Langasek

Bug Description

Since the new procps upload
 procps | 2:3.3.16-1ubuntu1 | focal-proposed | source, amd64, arm64, armhf, i386, ppc64el, s390x
the tests of postgresql-common fail.

The issues are:
autopkgtest [15:43:10]: test run-testsuite: - - - - - - - - - - results - - - - - - - - - -
run-testsuite FAIL non-zero exit status 2
autopkgtest [15:43:10]: test run-testsuite: - - - - - - - - - - stderr - - - - - - - - - -
# Failed test 'default log is not used'
# at ./t/020_create_sql_remove.t line 143.
# Failed test 'default log is not used'
# at ./t/020_create_sql_remove.t line 154.
# Looks like you failed 2 tests of 144.

It was retried a few times by different people now and reproduced so we can assume it is a real issues for now.

I was running two VMs to compare:
$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --test-name=run-testsuite --apt-pocket=proposed=src:procps --shell postgresql-common_211.dsc -- qemu --qemu-options='-cpu host' --ram-size=2048 --cpus 2 ~/work/autopkgtest-focal-amd64.img
(and the other one without --apt-pocket=proposed=src:procps)

The issue reproduced locally in only the one with the new procps.

In Debian the latest similar run is
https://ci.debian.net/data/autopkgtest/testing/amd64/p/postgresql-common/4279031/log.gz
https://ci.debian.net/data/packages/testing/amd64/p/postgresql-common/4279031.log
But that is pre-3.3.16 and I wasn't able to find their result with the same combination.

Lets mark this bug update-excuse and check if we can find anything in these VMs.

tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.4 KiB)

The failing tests are part of 020_create_sql_remove.t:

The checks are like:
  ok -z $default_log, "default log is not used";

And $default_log points to /var/log/postgresql/postgresql-12-main.log in both cases.

Entering interactive mode on fail:
$ sudo ./testsuite -f 20 -V -s

Eventually the whole cluster is cleaned, so the config and dirs are missing.
Changing the tests to find more details ...

The test sets a link:
/etc/postgresql/12/main/log -> /var/lib/postgresql/12/main/mylog
And then the instance is correctly using that log:
$ pg_lsclusters -h
12 main 5432 online postgres /var/lib/postgresql/12/main /var/lib/postgresql/12/main/mylog

But the old log "still" exists in the bad case and that breaks it.

log: Config >> symlink >> default path
The check wants to see the "default path" to be not existing - or actually not used.

It tries to clean the log via:
  is ((exec_as 'root', "pg_ctlcluster $v main stop"), 0, 'stopping cluster');
  open L, ">$default_log"; close L; # empty default log file

The problem seems to be non synchronous stop or a log file clean issue - the log still contains stuff after being cleaned.

ok 11 - correct output of pg_lsclusters -h
/usr/bin/ls: cannot access '/etc/postgresql/12/main/log': No such file or directory
-rw-r----- 1 postgres adm 559 Feb 24 09:23 /var/log/postgresql/postgresql-12-main.log
12 main 5432 online postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
PRE STOP
2020-02-24 09:23:07.177 CET [16492] LOG: starting PostgreSQL 12.1 (Ubuntu 12.1-2build1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-25ubuntu1) 9.2.1 20200123, 64-bit
2020-02-24 09:23:07.177 CET [16492] LOG: listening on IPv4 address "127.0.0.1", port 5432
2020-02-24 09:23:07.177 CET [16492] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-02-24 09:23:07.188 CET [16493] LOG: database system was shut down at 2020-02-24 09:23:06 CET
2020-02-24 09:23:07.192 CET [16492] LOG: database system is ready to accept connections
-- END --
ok 12 - stopping cluster
PRE CLEAN
2020-02-24 09:23:07.177 CET [16492] LOG: starting PostgreSQL 12.1 (Ubuntu 12.1-2build1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-25ubuntu1) 9.2.1 20200123, 64-bit
2020-02-24 09:23:07.177 CET [16492] LOG: listening on IPv4 address "127.0.0.1", port 5432
2020-02-24 09:23:07.177 CET [16492] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-02-24 09:23:07.188 CET [16493] LOG: database system was shut down at 2020-02-24 09:23:06 CET
2020-02-24 09:23:07.192 CET [16492] LOG: database system is ready to accept connections
2020-02-24 09:23:09.669 CET [16492] LOG: received fast shutdown request
2020-02-24 09:23:09.669 CET [16492] LOG: aborting any active transactions
2020-02-24 09:23:09.671 CET [16492] LOG: background worker "logical replication launcher" (PID 16499) exited with exit code 1
2020-02-24 09:23:09.673 CET [16494] LOG: shutting down
2020-02-24 09:23:09.680 CET [16492] LOG: database system is shut down
-- END --
POST CLEAN
2020-02-24 09:23:07.177 CET [16492] LOG: starting PostgreSQL 12.1 (Ubuntu 12.1-2build1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-25ubuntu...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Replacing the perl construct
 open L, ">$default_log"; close L; # empty default log file
with the also more readable:
 truncate "$default_log", 0;
Fixes the issue in postgresql-common.

That might be a fix for this particular test, but (unfortunately) the construct above is a common use-case in perl how did procps break that??

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

A trivial check like:
$ echo foo > testfile; perl -e 'my $test = "/home/ubuntu/testfile"; open L, ">$test"; close L;' cat testfile

Is not broken, it must be more special to the test env of postgresql-common ?!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I found that it depends on file permissions/ownership.
The construct:
  open L, ">$default_log"; close L; # empty default log file
only works if you are the file owner.

The test is running as root and the open/close will fail to open the postgres:adm opened file.

A simple test might look like:
use strict;
use warnings;

my $default_log = "/tmp/test";

system("rm $default_log");
system("echo 'SOME TEST CONTENT' > $default_log");
system("chmod 666 $default_log");
system("chown postgres:adm $default_log");
system("id");

system("/usr/bin/ls","-laF","$default_log");
system("echo","PRE");
system("/usr/bin/cat","$default_log");

open L, ">$default_log"; close L; # empty default log file

system("echo","POST");
system("/usr/bin/cat","$default_log");

I'm still missing the role of the new procps version in this, as the above is the same in both procps versions.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I see no difference between the simple test above and the status when the real test is running:

Bad:
-rw-r----- 1 postgres adm 559 Feb 24 10:55 /var/log/postgresql/postgresql-12-main.log
12 main 5432 online postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log

Good:
-rw-r----- 1 postgres adm 1594 Feb 24 10:56 /var/log/postgresql/postgresql-12-main.log
12 main 5432 online postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log

That is exactly the same (some file size difference) and works to clean in one, but fails in the other environment.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Submitted to Debian via:
https://salsa.debian.org/postgresql/postgresql-common/merge_requests/8

And if there are no other blockers Myon will do an upload soon, so we can sync and retest procps against that then.

Changed in postgresql-common (Ubuntu):
status: New → In Progress
Revision history for this message
Steve Langasek (vorlon) wrote :

The behavior change is because the new procps from Debian is setting an additional sysctl setting: ./debian/protect-links.conf:fs.protected_regular = 2

systemd in 20.04 was previously setting this to 1, which led to some issues, discussed at https://lists.ubuntu.com/archives/ubuntu-devel/2020-February/040904.html

No one appears to be asking for this to be set to 2, and we don't know how many other things will be broken by this change. We might want to revert this in procps.

Changed in procps (Ubuntu):
assignee: nobody → Steve Langasek (vorlon)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI:
- postgresql-common itself had a flaky test on armhf (re-started) and should migrate soon (everything else is good)
- I triggered the procps tests with the new postgresql-common

@Steve - thanks for picking up the cleanup of the fs.protected_regular setting on procps itself

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Tests are all good now, things should migrate once britney gets to them ...

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-common - 213

---------------
postgresql-common (213) unstable; urgency=medium

  [ Christian Ehrhardt ]
  * t/020_create_sql_remove.t: fix file clear with procps 3.16. (LP: #1864423)

  [ Christoph Berg ]
  * pg_lsclusters: List clusters even if the corresponding PostgreSQL
    binaries are missing; include "binaries_missing" in status column.
    (Salsa #13)
  * Reload systemd on install since we don't do that automatically via
    dh_installinit. Problem reported by velix, thanks! (Salsa #12)
  * debian/maintscripts-functions: Save full bin and lib dirs on catversion
    bump. Problem reported by Komzzpa, thanks! (Salsa #9)
  * debian/maintscripts-functions: Suppress errors from pg_controldata, the
    control file might be missing if the cluster was pg_upgraded. Spotted by
    Michael Banck, thanks! (Salsa #6)
  * pg_wrapper: Reword docs to better describe clusters on the local system.
    Reported by James Coleman, thanks! (Closes: #950149)
  * pg_wrapper: Document that `ALTER SYSTEM SET port` will interact badly with
    cluster selection. (Closes: #919385)

 -- Christoph Berg <email address hidden> Mon, 24 Feb 2020 16:20:14 +0100

Changed in postgresql-common (Ubuntu):
status: In Progress → Fix Released
Steve Langasek (vorlon)
Changed in procps (Ubuntu):
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.