Migrando bases de dados LATIN1 para UTF-8 com o PostgreSQL

6 Feb

Nowadays it is a standard that all web sites and servers run at least with UTF-8 encoding as default, however you can find cases where it is LATIN1 or even ASCII.

The case we are dealing with today is a Debian server, version 5.0.3 (etch), which used encode LATIN1 as default encoding when installed and configured the Postgresql, and not let me create a database with UTF-8 encode, showing following error:

debian:~# su postgres -c 'createdb teste3  -E UTF-8'
createdb: database creation failed: ERROR:  encoding UTF8 does not match server's locale en_US
DETAIL:  The server's LC_CTYPE setting requires encoding LATIN1.

Why??

When Postgresql runs first time are created a cluster using default system encode, and after that, we can’t change cluster encoding. The default cluster are know as ‘main’.

In case of this Debian server, the default encode was configured to en_US, what for Debian means en_US.ISO-8859-1. By default in Debian Lenny, in my new instalations, system uses en_US.UTF-8 as default encoding, so, this server was probably an Debian Etch upgraded to Debian Lenny

The fix

After some research, I discovered that, after configure the server encode to use UTF-8, I can create a new cluster with UTF-8 support. See the walktrough:

  • Creating a new cluster: Create a new cluster is a very simple operation:
    debian:~# pg_createcluster 8.3 utf8_cluster
    Creating new cluster (configuration: /etc/postgresql/8.3/utf8_cluster, data: /var/lib/postgresql/8.3/utf8_cluster)...
    Moving configuration file /var/lib/postgresql/8.3/utf8_cluster/postgresql.conf to /etc/postgresql/8.3/utf8_cluster...
    Moving configuration file /var/lib/postgresql/8.3/utf8_cluster/pg_hba.conf to /etc/postgresql/8.3/utf8_cluster...
    Moving configuration file /var/lib/postgresql/8.3/utf8_cluster/pg_ident.conf to /etc/postgresql/8.3/utf8_cluster...
    Configuring postgresql.conf to use port 5433...
    
    debian:~# /etc/init.d/postgresql-8.3 restart
    

    The cluster runs on first free port after 5432, unless that you specify. Do a Postgresql restart to take effect.

  • Creating the user: Now, create the user to access database, I recommend that you use same username and password for user, and, set your authentication for trust on cluster’s pg_hba.conf:
    debian:~# su postgres -c "createuser -P pyuser -p 5433"
    Enter password for new role:
    Enter it again:
    Shall the new role be a superuser? (y/n) n
    Shall the new role be allowed to create databases? (y/n) n
    Shall the new role be allowed to create more new roles? (y/n) n
    
  • Populating new databases: Now you move databases to new cluster. First, you createdb and after run a pg_dump | psql to dump and populate database very fast, but be careful with ports:
    debian:~# su postgres -c "createdb teste3 -e -p 5433 -O pyuser"
    CREATE DATABASE teste3 OWNER pyuser;
    

    Now list the databases:

    debian:~# su postgres -c "psql -l -p 5433"
            List of databases
       Name    |  Owner   | Encoding
    -----------+----------+----------
     postgres  | postgres | UTF8
     template0 | postgres | UTF8
     template1 | postgres | UTF8	
     teste3    | pyuser   | UTF8
    (4 rows)
    

    After create database, dump and populate:

    debian:~# pg_dump -U pyuser teste3 | psql -U pyuser -p 5433 teste3
  • Configuring your application: That’s are one small part, just change settings of your application to connect on port 5433, or what you choose.

Do each step after creation of database for each database that you need to move, and have all working with UTF-8.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)

4 Responses to “Migrando bases de dados LATIN1 para UTF-8 com o PostgreSQL”

  1. Eberson December 20, 2010 at %I:%M %p #

    Cara Muito boa sua dica, testei e cria o cluster 100%, mas o meu caso é assim so estou arrumando um servidor de dados em lenny o programador nao esta aqui portanto nao consiogo mudar a porta da aplicação 🙂 será que nao tem como mudar a configuração default do postgre pra usar a codificação utf8??? Se vc puder dar uma dica agradeceria…

    VA:F [1.9.22_1171]
    Rating: 0.0/5 (0 votes cast)
    • Felipe 'chronos' Prenholato December 21, 2010 at %I:%M %p #

      Oi Eberson, desculpa a demora para responder. Se eu não me engano (não to com um postgres agora para checar, pois mudei de trabalho) cada cluster tem um arquivo postgresql.conf. Você indo nestes arquivos e editando a configuração port, e depois restartando o postgresql, deve resolver o seu problema.

      VA:F [1.9.22_1171]
      Rating: 0.0/5 (0 votes cast)
  2. Marc PC1MH November 19, 2010 at %I:%M %p #

    Hi,
    Thanks for the help.
    I had the same problem and solved it in a slightly different way. I’d like to share with you all the following:

    If the database is in LATIN1 and the data is in UTF-8 then:

    createdb: database creation failed: ERROR: encoding UTF8 does not match server’s locale en_US
    DETAIL: The server’s LC_CTYPE setting requires encoding LATIN1.

    – purge postgres (apt-get purge postgres postgres-common)
    – check out /etc/locale.gen and add: en_US.UTF-8 UTF-8
    – Then run as root: locale-gen
    – Reinstall postgres and now all databases are in UTF-8

    VA:F [1.9.22_1171]
    Rating: 0.0/5 (0 votes cast)
  3. marc March 10, 2010 at %I:%M %p #

    Hello! Your post (Moving PostgreSQL databases from LATIN1 to UTF-8 – Chronosbox) does so well that I would like to translate it into French, publish on my french blog and link to you. You have something against it? Regards

    VA:F [1.9.22_1171]
    Rating: 0.0/5 (0 votes cast)

Leave a Reply