postgresql-8.3 cluster locale should not be utf8

Bug #207779 reported by Torsten Krah on 2008-03-27
Affects: postgresql-common (Debian) | Status: Fix Released | Importance: Unknown
Affects: postgresql-common (Ubuntu) | Importance: Medium | Assigned to: Martin Pitt

Bug Description

Binary package hint: postgresql-8.3

On Gutsy, the initdb command runs with a UTF-8 locale.
I have no objection to using UTF-8, but a UTF-8 locale prevents you from using any other charset that is not compatible with it (e.g. latin1).

I think it would be better to use the C or POSIX locale with UTF8 as the default encoding, so that you can still use charsets other than UTF-8 (initdb --locale=POSIX -E UTF8 ...)

Torsten

Martin Pitt (pitti) wrote :

IMHO UTF-8 is absolutely the right thing nowadays. But even if you do not like it, you are welcome to pg_dropcluster the existing instance, and create a new one (man pg_createcluster) using a different default encoding. Or just create a new database with a different encoding.

What do you mean by "prevents you from using other charsets"?

Changed in postgresql-8.3:
status: New → Invalid
Torsten Krah (tkrah) wrote :

It's not about avoiding UTF-8 as the default encoding for databases, only about not using it for the locale.
If you use e.g. de_DE.UTF-8 as the locale and issue this command:

createdb XX -E LATIN1

you get:

encoding LATIN1 does not match server's locale de_DE.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

This is what I mean by "prevents": you can't create another database with the latin1 charset on this cluster, as you suggested.

You are forced to use UTF-8, even if you have old databases which have to use latin1.
Using POSIX or C as the locale does not show this problem.
I know how to create a new cluster (take the initdb script, set encoding, locale and path, and it's ready).
But upgrading from existing installations might fail here, and it does not behave like previous versions (see the Debian bug report).
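
To make the workaround concrete, here is a minimal sketch of creating a second cluster with the C locale and a LATIN1 database in it. It assumes the postgresql-common wrappers (pg_createcluster with --locale and -e, and the --cluster option of the client wrappers); the cluster name "legacy" is hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Assumption: "legacy" is a made-up cluster name; 8.3 is the version
# discussed in this report.
create_c_locale_cluster() {
    # C locale is compatible with every encoding; UTF8 stays the default.
    run sudo pg_createcluster --locale=C -e UTF8 --start 8.3 legacy
    # With the C locale, a non-UTF8 database can be created in the cluster.
    run sudo -u postgres createdb --cluster 8.3/legacy -E LATIN1 latintest
}

create_c_locale_cluster
```

The trade-off is that sorting and case conversion in a C-locale cluster are byte-based rather than language-aware.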

I found a similar Debian bug report:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=472930

Hope this helps.

Torsten Krah (tkrah) wrote :

Reopened to discuss this further, as upgrading existing databases may fail.

Changed in postgresql-8.3:
status: Invalid → New

For the record, this is the reproduction recipe:

setup:

sudo pg_createcluster 8.2 main --start
sudo -u postgres createdb -E latin1 latintest
sudo -u postgres createdb utf8test
sudo -u postgres psql -c "create table t(x varchar); insert into t values(E'A\xC3\xB6B');" utf8test
sudo -u postgres psql -c "create table t(x varchar); insert into t values(E'A\xF6B');" latintest

verify that 8.2 DBs have correct encoding:

$ psql -Atc 'select * from t' utf8test
AöB
$ psql -Atc 'select * from t' latintest | iconv -f latin1
AöB
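
As a side note on why the two inserts use different byte sequences: "ö" is the single byte 0xF6 in LATIN1 but the two bytes 0xC3 0xB6 in UTF-8. The following sketch checks this outside the database with printf, od and iconv (octal escapes are used because they are portable across shells; the helper name hexbytes is made up for this example).

```shell
# Hex-dump stdin as one lowercase string without separators.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# \366 is octal for 0xF6: "ö" in LATIN1.
latin1_hex=$(printf 'A\366B' | hexbytes)
# \303\266 is octal for 0xC3 0xB6: "ö" in UTF-8.
utf8_hex=$(printf 'A\303\266B' | hexbytes)
# Converting the LATIN1 bytes should yield exactly the UTF-8 bytes.
converted_hex=$(printf 'A\366B' | iconv -f LATIN1 -t UTF-8 | hexbytes)

echo "latin1:    $latin1_hex"
echo "utf8:      $utf8_hex"
echo "converted: $converted_hex"
```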

upgrade to 8.3 which breaks:
sudo pg_upgradecluster 8.2 main

I am not quite clear yet on how to work around this problem. My
preferred approach would be to create the 8.3 databases with a proper
encoding on upgrade and convert the data on the fly. But at least I can
put the above recipe into the postgresql-common test suite.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Changed in postgresql-common:
importance: Undecided → Medium
status: New → Confirmed
Martin Pitt (pitti) wrote :

Fixed in trunk.

Changed in postgresql-common:
assignee: nobody → pitti
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-common - 87

---------------
postgresql-common (87) unstable; urgency=medium

  * Urgency medium since #472930 is an important bug fix.
  * debian/init.d-functions: If there are no clusters, exit with 4 (LSB-code
    for "unknown status") instead of 0 (which means "service is running", but
    it is debatable and confusing whether all clusters are running if there
    are none at all). (LP: #203966)
  * Update Spanish debconf translations, thanks Javier Fernández-Sanguino
    Peña. (Closes: #473405)
  * t/060_obsolete_confparams.t: Run upgrades under
    default_transaction_read_only=on. t/040_upgrade.t still uses the default
    "off", so both cases get tested. This replicates the problem report from
    Karsten Hilbert.
  * pg_upgradecluster: Work with default_transaction_read_only=on.
  * debian/autovacuum.conf, architecture.html: Point out that this file is
    only relevant for PostgreSQL versions earlier than 8.1. Thanks to Ross
    Boylan for pointing this out.
  * Add t/051_inconsistent_encoding_upgrade.t: Check that upgrades from
    pre-8.3 to 8.3 succeed and have correct encodings if the old DB had a
    database whose encoding did not match the server locale. This reproduces
    #472930.
  * pg_upgradecluster: Fix handling of database encodings on upgrade, since
    8.3 now forces DB encodings and server locale to match:
    - With C locale, keep encoding of DBs on upgrade, just as in previous
      versions. (C is compatible with all encodings, and causes lots of string
      functions not to work correctly, but people still use it deliberately.)
    - With other locales, create the target DB manually with a compatible
      encoding, and call pg_restore in a way to not create the target DB and
      automatically convert encoding.
    - Closes: #472930, LP: #207779

 -- Martin Pitt <email address hidden> Mon, 31 Mar 2008 14:15:25 +0100
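
For readers who want to see the idea behind the non-C-locale case, the following is a rough sketch of it, not the actual pg_upgradecluster code: create the target database by hand with an encoding the new server accepts, then restore a custom-format dump into it without -C, so that the server converts the data from the encoding recorded in the dump. The ports 5432 (old cluster) and 5433 (new cluster), the dump file name, and the function name migrate_db are assumptions for illustration; the commands would need to run as a user with access to both clusters. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# migrate_db DBNAME OLD_PORT NEW_PORT
migrate_db() {
    db=$1
    old_port=$2
    new_port=$3
    # Create the target database manually with a locale-compatible encoding.
    run createdb -p "$new_port" -E UTF8 "$db"
    # Dump the old database in custom format.
    run pg_dump -Fc -p "$old_port" -f "$db.dump" "$db"
    # Restore into the existing database (no -C, so it is not recreated).
    run pg_restore -p "$new_port" -d "$db" "$db.dump"
}

migrate_db latintest 5432 5433
```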

Changed in postgresql-common:
status: Fix Committed → Fix Released
Changed in postgresql-common:
status: Unknown → Fix Released