pg_xlog entries consuming large amount of disk space

Bug #1547238 reported by Francis Ginther
Affects                              Status   Importance  Assigned to  Milestone
PostgreSQL Charm                     Triaged  High        Unassigned
postgresql (Juju Charms Collection)  Triaged  High        Unassigned

Bug Description

I have a deployment with two postgresql units attached to a 5GB nova volume deployed with lp:charms/trusty/postgresql;revno=141. The volume filled up with pg_xlog files after running for about 6 days with very little database activity.

The only non-default options are:
        max_prepared_transactions: 500

I realize that the remaining default options lead to 'wal_keep_segments = 5000', which would consume 80GB of WAL files. However, it looks like something else may be happening and these files are being kept for longer than they should be. The issue here is that the disk filled up with nearly the default options. If there is a recommended disk size for the default settings, is it documented in the charm?
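
For reference, the 80GB figure can be checked with a little arithmetic. The sketch below assumes the stock 16 MB WAL segment size (a build-time default that a custom build could change); the helper names are made up for illustration and are not part of the charm.

```python
# Rough sizing sketch. Assumes the default 16 MB WAL segment size.
WAL_SEGMENT_MB = 16


def wal_keep_mb(wal_keep_segments, segment_mb=WAL_SEGMENT_MB):
    """Disk space (in MB) that wal_keep_segments asks PostgreSQL to retain."""
    return wal_keep_segments * segment_mb


def max_segments_for(partition_gb, segment_mb=WAL_SEGMENT_MB):
    """Upper bound on segments that fit on a partition holding nothing else."""
    return (partition_gb * 1024) // segment_mb


def fits(wal_keep_segments, partition_gb):
    """True if the retained WAL alone would fit on the partition."""
    return wal_keep_mb(wal_keep_segments) <= partition_gb * 1024


print(wal_keep_mb(5000))      # 80000 MB, the "80GB" quoted above
print(max_segments_for(5))    # 320 segments at most on a 5GB volume
print(fits(5000, 5))          # False
```

In practice the usable number is lower than the upper bound, because the same volume also holds the database itself.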

I'll attach the postgresql.conf files from each instance and the pg_xlog directory listings.

Francis Ginther (fginther) wrote :

Attaching tgz with postgresql.conf and directory listings.

Stuart Bishop (stub) wrote :

The setting is working as intended. wal_keep_segments=5000 does indeed tell PostgreSQL to keep 80GB of WAL files around, and of course you can't fit 80GB of files on a 5GB partition.

The setting is used as a replication buffer. If a standby lags behind, the master remains oblivious and will happily remove WAL files even if they haven't been sent to the standby. This causes replication to that standby to fail, and it needs to be rebuilt. Setting wal_keep_segments to 5000 (80GB of WAL) provides an 80GB lag buffer to minimize the chances of this happening, and the value was chosen because it was the default used by the repmgr tool.

I agree this needs to be tuned better, and disk space recommendations documented. Also, we can take advantage of the replication slots feature available in 9.4 which makes this lag buffer dynamic.

I'll call this bug closed when:
 - wal_keep_segments defaults to 0 when there are no standbys
 - even when there are standbys, wal_keep_segments defaults to a smaller value.
 - Recommended partition size is documented in README.md
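
The first two criteria could be sketched roughly as follows. This is not the charm's code: the function name, the 25% space budget and the cap of 500 segments are all hypothetical values picked for illustration, and the 16 MB segment size is the stock default.

```python
def default_wal_keep_segments(standby_count, partition_mb,
                              segment_mb=16, max_fraction=0.25, cap=500):
    """Pick a wal_keep_segments default (hypothetical policy, not the charm's).

    With no standbys there is nothing to buffer for, so keep 0 segments.
    With standbys, spend at most max_fraction of the partition on WAL,
    and never more than cap segments.
    """
    if standby_count == 0:
        return 0
    budget_segments = int(partition_mb * max_fraction) // segment_mb
    return min(cap, budget_segments)


print(default_wal_keep_segments(0, 5 * 1024))     # no standbys
print(default_wal_keep_segments(1, 5 * 1024))     # one standby, 5GB volume
print(default_wal_keep_segments(1, 1024 * 1024))  # one standby, 1TB volume
```

Tying the default to the partition size means a small volume like the 5GB one in this report never gets a retention target larger than the disk.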

Changed in postgresql-charm:
status: New → Triaged
importance: Undecided → High
Changed in postgresql (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
Stuart Bishop (stub) wrote :

The workaround is to set wal_keep_segments to a lower value (0 is common) and wait up to 10 minutes for PostgreSQL to clean up.
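
To confirm that the cleanup has happened, something like the sketch below can be run before and after changing the setting. It only counts files in the directory you pass in; the function name is made up, and the pg_xlog path depends on the installation, so it is left as an argument rather than guessed.

```python
import os
import re

# WAL segment files are named with 24 hexadecimal characters; other entries
# in pg_xlog (archive_status, *.history, *.backup) are skipped.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def pg_xlog_usage(pg_xlog_dir):
    """Return (segment_count, total_bytes) for WAL segments in pg_xlog_dir."""
    count = 0
    total = 0
    for entry in os.scandir(pg_xlog_dir):
        if entry.is_file() and SEGMENT_NAME.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total
```

Usage would be along the lines of `count, size = pg_xlog_usage(path_to_pg_xlog)`, run as a user that can read the data directory. The segment count should drop towards the new wal_keep_segments value once PostgreSQL has recycled the old files.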
