Postgres cannot start up after crashing

Bug #1634513 reported by Pavel
Affects: postgresql-common (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu 15.10

Postgresql 9.5+175.pgdg15.10+1

postgresql-common 175.pgdg15.10+1

# How to reproduce

Execute 'echo b > /proc/sysrq-trigger' (an immediate reboot without syncing disks) while postgres is under load.

After the machine restarts, systemd tries to start the cluster through pg_ctlcluster and fails.

Log messages:

2016-10-18 15:22:50 MSK [5513-1] LOG: database system was interrupted; last known up at: 2016-10-18 15:08:50 MSK
2016-10-18 15:22:50 MSK [5513-2] LOG: database system was not properly shut down; automatic recovery in progress
2016-10-18 15:22:50 MSK [5513-3] LOG: redo starts at A/ED186BA0
2016-10-18 15:22:50 MSK [5530-1] [unknown]@[unknown] LOG: incomplete startup packet
2016-10-18 15:22:51 MSK [5547-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:51 MSK [5550-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:52 MSK [5553-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:52 MSK [5556-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:53 MSK [5559-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:53 MSK [5562-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:54 MSK [5565-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:54 MSK [5570-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:55 MSK [5573-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:55 MSK [5576-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:56 MSK [5579-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:56 MSK [5508-1] LOG: received smart shutdown request
2016-10-18 15:22:56 MSK [5580-1] LOG: shutting down
2016-10-18 15:22:56 MSK [5580-2] LOG: database system is shut down

# Why it happens

pg_ctlcluster checks whether the cluster is running by connecting with psql.

pg_ctlcluster contains a function named cluster_port_ready that performs the check:

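    while ($n < ($result ? 10 : 3)) {
        # sleep for 0.5 s
        select undef, undef, undef, 0.5;
        # probe the server; only psql's stderr is captured in $out
        $out = `$psql -h '$sd' --port $p -l 2>&1 > /dev/null`;

        # debug output: captured stderr and raw exit status
        print STDERR "PSQL res: $out $?\n";

        # count consecutive probes that return the same exit status
        if ($? == $result) {
            $n++;
        } else {
            $n = 0;
        }
        $result = $?;
    }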

The function checks the exit code of psql after each attempt. It gives up after 10 consecutive failures at 0.5 s intervals, so the postmaster has roughly 5 s to finish crash recovery. After that, pg_ctlcluster returns exit code 1 and systemd sends SIGTERM to postgres.
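
To make the stopping rule of that loop easier to check, here is a small Python replay of its counting logic. It is only a sketch: the function name probe_verdict is made up for illustration, and it assumes that $n and $result both start at 0, which the excerpt does not show.

```python
from typing import Iterable, Tuple


def probe_verdict(statuses: Iterable[int], initial_result: int = 0) -> Tuple[int, int]:
    """Replay the counting logic of the loop over a sequence of exit statuses.

    Returns (number of probes consumed, last exit status seen).
    Assumption: $n and $result start at 0 (not visible in the excerpt).
    """
    n = 0
    result = initial_result
    probes = 0
    it = iter(statuses)
    # Same condition as the Perl loop: 10 repeats if the last probe failed, else 3.
    while n < (10 if result else 3):
        status = next(it)  # one psql invocation in the real code
        probes += 1
        if status == result:
            n += 1  # same outcome as the previous probe
        else:
            n = 0   # outcome changed, start counting again
        result = status
    return probes, result


# Server rejects every connection (psql exits non-zero each time).
print(probe_verdict([2] * 100))
# Server accepts connections from the first probe on.
print(probe_verdict([0] * 100))
```

Under those assumptions, a server that keeps rejecting connections is probed 11 times before the loop gives up (the first failure resets the counter), which at 0.5 s per probe is in line with the 5 to 6 seconds between the first and last FATAL lines in the log above.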

But the postmaster cannot accept any connections while recovery is in progress:

postmaster.c:2164
                case CAC_STARTUP:
                        ereport(FATAL,
                                        (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                                         errmsg("the database system is starting up")));
                        break;

# How to fix

Increase the timeout?

Check for the message "FATAL: the database system is starting up" during connect?

Determine the state of recovery and wait until it is done?
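
One possible shape for the second and third options, sketched in Python rather than in pg_ctlcluster's Perl. This is not a patch for pg_ctlcluster. The names wait_for_cluster, pg_isready_probe and max_unreachable, and the 300 s default timeout, are hypothetical. The only real tool used is pg_isready (shipped with PostgreSQL since 9.3), which exits with 0 when the server accepts connections, 1 when it rejects them (for example during startup), 2 when there is no response, and 3 when no attempt was made.

```python
import subprocess
import time
from typing import Callable

ACCEPTING = 0  # pg_isready: server is accepting connections
REJECTING = 1  # pg_isready: server is up but rejecting (e.g. still starting up)


def pg_isready_probe(host: str, port: int) -> int:
    """Run pg_isready once and return its exit status."""
    cmd = ["pg_isready", "-q", "-h", host, "-p", str(port)]
    return subprocess.run(cmd).returncode


def wait_for_cluster(probe: Callable[[], int],
                     timeout: float = 300.0,   # hypothetical default
                     interval: float = 0.5,
                     max_unreachable: int = 10,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Wait until the server accepts connections.

    A server that answers "rejecting" is treated as alive and recovering,
    so it is given the full timeout. A server that does not answer at all
    is given up on after max_unreachable consecutive probes.
    """
    deadline = clock() + timeout
    unreachable = 0
    while True:
        status = probe()
        if status == ACCEPTING:
            return True
        if status == REJECTING:
            unreachable = 0          # postmaster is alive, keep waiting
        else:
            unreachable += 1         # no response or probe could not run
            if unreachable >= max_unreachable:
                return False
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass something like `lambda: pg_isready_probe("/var/run/postgresql", 5432)` as the probe (socket directory and port are example values). The point of the design is that "the database system is starting up" no longer counts against the short failure budget. Only a postmaster that does not respond at all does.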
